1 PSpecteR Description

1.1 Abstract

Visual examination of mass spectrometry data is necessary to assess data quality and to facilitate data exploration. Graphics provide the means to evaluate spectral properties, test alternative peptide/protein sequence matches, prepare annotated spectra for publication, and fine-tune parameters during wet lab procedures. Visual inspection of MS data is hindered by proprietary proteomics visualization software designed for particular workflows and academic software that lacks visualization tools. We built PSpecteR, an open-source and interactive R Shiny web application to address these issues, with support for several steps of proteomics data processing, including: reading various mass spectrometry files, running open-source database search tools, labelling spectra with fragmentation patterns, testing post-translational modifications, plotting where identified fragments map to reference sequences, and visualizing algorithmic output and metadata. All figures, tables, and spectra are exportable within one easy-to-use graphical user interface. Our current software provides a flexible and modern R framework to support fast implementation of additional features. The open source code is readily available (https://github.com/EMSL-Computing/PSpecteR), and a PSpecteR Docker container is available for easy local installation.

1.2 Module description

1.3 Docker Design

PSpecteR comprises three Docker containers: one for the Shiny app architecture and two for the peptide database search tools MS-GF+ and MSPathFinderT. These containers share a mounted volume (data) for all file inputs and outputs (black arrows). The Shiny app container communicates with the other containers to start the database searches and return their status (blue arrows).

The MS-GF+ and MSPathFinder containers are built as Python Flask apps, with a Redis server running in the background and jobs managed as Celery tasks. PSpecteR constructs the URL calls that pass parameters and files to the other containers, and then the URL calls that check the status of currently running jobs by task id.
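
The request pattern described above can be sketched in Python as follows. This is only an illustration: the host name, port, routes (/msgf and /status/<task_id>), and parameter names are hypothetical placeholders, and the real routes are defined in the container sources.

```python
from urllib.parse import urlencode, urlunsplit


def build_search_url(host, port, route, params):
    """Build a URL that submits a search job.

    The route and parameter names are hypothetical, not the app's real API.
    """
    query = urlencode(sorted(params.items()))
    return urlunsplit(("http", f"{host}:{port}", route, query, ""))


def build_status_url(host, port, task_id):
    """Build a URL that asks for the state of a previously submitted task."""
    return urlunsplit(("http", f"{host}:{port}", f"/status/{task_id}", "", ""))


# Files are referenced by their path inside the shared data volume.
submit_url = build_search_url(
    "msgf", 5000, "/msgf",
    {"mzml": "/data/sample.mzML", "fasta": "/data/proteins.fasta"},
)
status_url = build_status_url("msgf", 5000, "abc123")
print(submit_url)
print(status_url)
```

Because every container mounts the same data volume, only file paths need to travel in the URL, not the files themselves.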

2 App Installation

We advise following the DockerHub instructions, as the GitHub versions require users to build the Docker containers themselves.

  1. Install Docker Desktop at (https://docker.com/products/docker-desktop)

2.1 From DockerHub

  2. Pull the PSpecteR container.

  3. Pull the MS-GF+ container.

  4. Pull the MSPathFinder container.

  5. Determine the shared data folder between the containers, and launch the docker compose file to start the app.

cd pspecter_container
export PSPECTER_DATA=/path/to/data/folder/within/pspecter_container
docker-compose up

  6. Open Docker Desktop (1) and then click open in browser for the pspecter_container (2).

2.2 From GitHub

  2. Open the terminal, go to the desired output folder, clone the GitHub repo for PSpecteR, and then build the Docker container.
git clone https://github.com/EMSL-Computing/PSpecteR
cd pspecter_container
docker build -t pspecter:1.0.0 .
  3. Return to the desired output folder. Then pull and build the container for MS-GF+.
cd ../msgfplus_container
docker build -t msgf:1.0 .
  4. Return to the desired output folder. Then pull and build the container for MSPathFinderT.
cd ../mspathfinder_container
docker build -t mspathfindert:1.0 .

5-6. See 5-6 in the DockerHub instructions.

3 Upload Data

3.1 Page Description

This page allows the user to upload:

  • A required mass spectrometry file (MS): mzML, mzXML, or Thermo Fisher raw

  • An optional protein ID file (ID): mzid

  • An optional protein database file (FA): fasta

You may also choose the test files at the bottom of the “Upload Data” page.

3.2 Page Usage

Click a dropdown menu to set the output directory or upload an MS, ID, or FA file. Use the “Search Folders” button to select a file, or type in an acceptable file path, which will give a check mark for a correct path or a red X for an incorrect path. Click “Use Path” to load the file, or “Clear Path” to remove it. When typing in an MS file path, if the ID and FA files have the same name and directory and the correct extension, their paths will autofill.

The “App Settings” menu contains a description mode, which loads interactive pop-ups that describe how to use the app. Note that enabling or disabling this toggle will clear any loaded data.

Test files are included under “Use Test Files” and are enabled with toggle switches.

4 MS & XIC

4.1 Page Description

MS & XIC allows you to see how well a peptide sequence maps to a specific MS scan. Visualize fragmentation patterns, extracted ion chromatograms (XICs), the best identified fragment ion per peptide, and more.

4.2 Plots and Tables

All plots are plotly graphics which allow for zooming, panning, autoscaling, and filtering by category within the legend. All tables are DT tables which can be sorted (multi-sorting is enabled with shift), searched, and subsetted.

MS/MS: Hover over the peaks to get the identified ion name, m/z value, and intensity value. Ion names are broken down into fragment type (N-terminus: a, b, c; C-terminus: x, y, z), charge in superscript, and isotope (M+n). Colors for ion types are consistent throughout figures: a - green, b - blue, c - purple, x - dark pink, y - red, and z - orange. Select specific annotated peaks with the table under ‘Filter Ions.’ All peak data can be exported in the ‘7. Export Data’ section with the button ‘Export Peak Data.’

Error Map: See the PPM error (how far the experimental m/z is from the theoretical m/z) per amino acid per ion type. Each square is colored by PPM error, where red indicates a positive error and blue a negative error. PPM error is calculated as ((exp mz - theo mz) / theo mz) * 1e6. Isotopes are removed by default, but can be added under ‘4b. Fragment Figures’ with the ‘Remove Iso: Graphics?’ toggle.
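
As a worked example of the PPM error formula, the short Python sketch below computes the signed error for one peak and checks it against a tolerance. The function names and the 10 ppm tolerance are illustrative assumptions, not part of PSpecteR.

```python
def ppm_error(experimental_mz, theoretical_mz):
    """Signed error in parts per million: positive when the measured m/z is too high."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(experimental_mz, theoretical_mz, tolerance_ppm):
    """True when the peak lies within +/- tolerance_ppm of the theoretical m/z."""
    return abs(ppm_error(experimental_mz, theoretical_mz)) <= tolerance_ppm


# A peak measured 0.001 m/z above a theoretical value of 500 is off by about 2 ppm.
error = ppm_error(500.0010, 500.0000)
print(round(error, 3))
print(within_tolerance(500.0010, 500.0000, tolerance_ppm=10))
```

On the error map, this peak would be drawn in red because its error is positive.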

XIC: Generate XICs (extracted ion chromatograms), which plot intensity against retention time across all MS1 scans for peaks matching the XIC MZ value. Traces can be specified as either isotopes or adjacent charge states. Use the charge state slider to make XICs based on isotopes (e.g. selected isotopes of 0, 1, 2 would result in traces for mz + 0/charge, mz + 1/charge, mz + 2/charge) or based on adjacent charge states (e.g. selected charges of 1, 2, 3 would result in traces of mz/1, mz/2, mz/3). Retention time and intensity data are drawn collectively from all the MS1 scans in the file.
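
The isotope traces can be sketched as below. The sketch follows the spacing given above (mz + n/charge) and adds a PPM window around each target. The function names and the 10 ppm value are assumptions for illustration, and the adjacent charge state mode is not shown.

```python
def isotope_trace_mzs(precursor_mz, charge, isotopes):
    """Target m/z of each isotope trace, using the nominal spacing of 1/charge."""
    return [precursor_mz + n / charge for n in isotopes]


def ppm_window(mz, tolerance_ppm):
    """Lower and upper m/z bounds that lie within tolerance_ppm of a target m/z."""
    delta = mz * tolerance_ppm / 1e6
    return (mz - delta, mz + delta)


# A doubly charged precursor at m/z 500 with isotopes 0, 1, and 2 selected.
traces = isotope_trace_mzs(500.0, charge=2, isotopes=[0, 1, 2])
window = ppm_window(traces[0], tolerance_ppm=10)
print(traces)
print(window)
```

MS1 peaks falling inside a trace's window would contribute their intensity to that trace at the scan's retention time.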

Previous MS1: View the previous MS1 precursor spectrum with labelled isotopes within the isolation window. The red lines indicate the theoretical m/z values for a given peptide. Hover-over boxes display m/z, intensity, isotope, reference intensity listed as a proportion, and the percent difference, ((exp intensity - theo intensity) / exp intensity) * 100%. Matched ions can be filtered by percent difference.

Next MS1: View the next MS1 precursor spectrum with labelled isotopes within the isolation window. The red lines indicate the theoretical m/z values for a given peptide. Hover-over boxes display m/z, intensity, isotope, reference intensity listed as a proportion, and the percent difference, ((exp intensity - theo intensity) / exp intensity) * 100%. Matched ions can be filtered by percent difference.

Filter Ions: Filter the annotated MS/MS by selecting ions in the filter ions table. Ion data can be exported under ‘7. Export Data’ with the ‘Export Fragment Data’ button.

Scan Metadata Table: View metadata from both MS and ID files. From the MS data, scan number, MS level, retention time, precursor m/z, precursor charge, precursor scan, and activation method are extracted. From the ID data, the sequence, protein ID, mass, score, q-value (an adjusted p-value based on false discovery rate, FDR), whether the protein is a decoy (a reverse-sequence peptide used to calculate FDR), and a protein description are extracted. Clicking on a row will trigger all the Scan and XIC visualizations. Columns can be added or removed with ‘5. Table Column Settings.’ Tables can be searched (dialog boxes) and sorted (arrows). To select a range, like from 3 to 5, type ‘3 … 5’ into the search box. Holding shift allows for sorting with multiple categories. Above the scan metadata is the number of peaks for that mass spectrum, along with coverage, which is the percentage of amino acids in the sequence with at least one fragment assigned to them, excluding the first amino acid from the N-terminus direction. The table can be scrolled horizontally if not all the columns fit. All data for this table can be exported in ‘7. Export Data’ with the ‘Export MS & ID Data’ button.
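
A minimal sketch of the coverage metric described above is given here. It assumes 1-based residue positions and assumes the excluded first residue is also left out of the denominator. The text does not state the denominator, so treat that choice as a guess.

```python
def sequence_coverage(sequence, covered_positions):
    """Percentage of residues (position 2 onward) with at least one assigned fragment.

    covered_positions holds 1-based positions counted from the N-terminus.
    Assumption: the excluded first residue is also dropped from the denominator.
    """
    n = len(sequence)
    if n < 2:
        raise ValueError("sequence must contain at least 2 amino acids")
    hits = sum(1 for pos in range(2, n + 1) if pos in covered_positions)
    return 100.0 * hits / (n - 1)


# Fragments were assigned to residues 1, 2, 3, and 5 of a 7-residue peptide.
coverage = sequence_coverage("PEPTIDE", {1, 2, 3, 5})
print(coverage)
```

Residue 1 is ignored, so three of the six remaining residues count as covered.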

Annotated Sequence: See the lowest ppm error per N-terminus (a, b, or c) and C-terminus (x, y, or z) ion per amino acid. Hover-over to see the PPM error of each fragment. Post-translational modifications are visualized with an asterisk with their name in the hover-over boxes.

Ion Annotation: Sort peptides by the number of ions identified in the MS/MS scan. All ions with isotopes are displayed.

Ion Barplot: See the number of ions identified per type. Under ‘4b. Fragment Figures’, isotopes may be removed or included, and the counts can be restricted to one unique ion per fragment.

4.3 Parameters

  1. Filter Settings

    • Fragment Type - Choose which ion types to plot. Ions are automatically selected by activation method. HCD - b, y. CID - a, b, y. ETD - b, c, y, z. UVPD - a, b, c, x, y, z.
    • Intensity - Select the minimum permissible value for intensity.
    • Fragment Tolerance - Select how close (+/-) experimental fragmentation m/z values must be to theoretical m/z values.
    • Isotopic Percentage - Select the minimum isotopic percentage. For example, if a fragment has an isotope with m/z values 333.5, 334.5, and 335.5, and isotopic percentages of 80%, 15%, and 5%, respectively, a minimum percentage of 10% will remove the 335.5 m/z isotope.
    • Correlation Score - Select the minimum correlation score value. Note that all peaks will be visualized when ‘ALL’ is selected.
  2. Spectra Settings

    • Spectra View: Label Size - Change the size of the labels when annotations are enabled.
    • Spectra View: Set Relative Label Distance - Since labels do not change position based on the x and y-axis, users must select how close or far they would like the labels from the peaks on the annotated spectrum. This is used to create graphics which are easier to read.
    • Spectra View: Annotate Spectrum - For each identified fragment, print the fragment type information by that peak on the spectra.
    • Spectra View: Real-Time Editing - Allow spectra to be edited in real time as features are enabled.
    • Spectra View: Spectra Full Screen - Make spectra full screen.
    • Isotope Settings: Remove Isotopes from the Spectra - Remove isotopes from all graphs, tables, and spectra.
  3. Sequence Settings

    • Test Different Sequence - Sequence must contain at least 2 valid amino acids and no spaces.
    • Apply Sequence - Identify fragments with entered sequence.
    • Restore Sequence - Replace the current sequence with the one identified in the ID file for this scan.
  4. Graphics Settings

    • MS1 Plots Settings: Percentage Difference - Set the threshold for the +/- percent difference, ((measured intensity - expected intensity) / expected intensity) * 100%, for precursor isotope identification.
    • MS1 Plots Settings: Increase Window Size - Increase the window size for the isolation window. Max size is 50 m/z.
    • MS1 Plots Settings: MS1 Full Screen - View MS1 in a large pop up window.
    • MS1 Plots Settings: Next MS1 Full Screen - View the next MS1 in a large pop up window.
    • Fragment Figures: Count per fragment in Bar Plot - Tally the count of ions per fragment (discounting charge/isotope) or all fragments (accounting for charge and isotope if enabled).
    • Fragment Figures: Annotate PTMs in Sequence - Mark post-translational modifications with an asterisk in the Annotated Sequence.
    • Fragment Figures: Remove isotopes in figures - Remove isotopes from the Error Map, Annotated Sequence, and Barplot.
  5. Table Column Settings

    • Select Scan Table Columns - Choose which columns to include in the scan table.
    • Select Fragment Columns - Choose which columns to include in the fragment table: ion type (e.g. a, b, c, etc.), amino acid position in the sequence, amino acid, charge, isotope, m/z, and correlation score.
  6. XIC Settings

    • Set XIC Tolerance (PPM) - Select the tolerance window (how close experimental and theoretical values must be in ppm) for an extracted ion chromatogram.
    • Charge State - View XIC traces by isotope or by adjacent charge state.
    • Max Number of Isotopes - Autopopulate the input widget Select Isotopes with 1 to the max number of isotopes.
    • Select Isotopes - Choose the isotopes to visualize.
    • Max Charge States - Autopopulate the input widget Select Charges with 1 to the max charge state for the m/z charge states, where m is the mass and z is each charge state.
    • Select Charges - Choose the charge states to visualize.
    • MZ - The precursor m/z is autofilled from the MS file. It can be changed.
    • Charge - The precursor charge is autofilled from the MS file. It can be changed.
  7. Export Images and Data

    • Export Images - Take a snapshot of any plot as is. A pop up window will allow zooming capabilities on some plots. To see your images, click ‘Export Snapshot Images’ in the upper right hand corner of the app.
    • Export Data - Export data from any of the tables, and the peak data, as a CSV. A pop-up menu will allow for subsetting the data.
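
Two of the Filter Settings above (item 1) can be illustrated with a short Python sketch: the default ion types per activation method and the Isotopic Percentage example. The data structures and the inclusive comparison are assumptions made for illustration. Only the values come from the text.

```python
# Default ion types per activation method, as listed under Fragment Type.
DEFAULT_IONS = {
    "HCD": ["b", "y"],
    "CID": ["a", "b", "y"],
    "ETD": ["b", "c", "y", "z"],
    "UVPD": ["a", "b", "c", "x", "y", "z"],
}


def filter_isotopes(peaks, min_percentage):
    """Keep (m/z, isotopic percentage) pairs at or above the minimum percentage.

    Assumption: a peak exactly at the threshold is kept.
    """
    return [(mz, pct) for mz, pct in peaks if pct >= min_percentage]


# The example from the Isotopic Percentage setting: a 10% minimum drops m/z 335.5.
peaks = [(333.5, 80.0), (334.5, 15.0), (335.5, 5.0)]
kept = filter_isotopes(peaks, min_percentage=10.0)
print(DEFAULT_IONS["ETD"])
print(kept)
```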

5 Vis PTM

5.1 Page Description

Visualize PTM allows you to determine if a specific spectrum and sequence pairing would be a better fit with one or more post-translational modifications. Many PTM combinations can be searched here, either by a single manual search or by a combinatorial search (Dynamic Mod Search).

5.2 Plots and Tables

For each PTM, an annotated spectrum, error map, annotated sequence, and previous/next MS1 graphics are generated. Click a row of the PTM Table to choose which modified sequence these visualizations display.

PTM Table: All dynamic and specific mod searches will appear here until the sequence, scan, or search parameters are changed. Export this data with ‘6. Export Data’ as either a CSV or a visually appealing markdown file with spectra for every PTM in this table.

Dynamic Modifications Search: Generate and then search through multiple modifications combinatorially. A table will pop up and allow you to select modifications and either keep (“Keep Selected”) or delete (“Delete Selected”) them. When ready to run, click “Calculate” and a pop-up load bar will let you know when the samples are finished. The pop-up menu can be exited by selecting “Exit.”
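
To show why a combinatorial search grows quickly, the Python sketch below enumerates every way to place up to a maximum number of modifications on eligible residues. It is not PSpecteR's search code. The function name, the one-modification-per-residue rule, and the example modifications are assumptions for illustration.

```python
from itertools import combinations


def enumerate_modified_sequences(sequence, modifications, max_mods):
    """List every placement of 1..max_mods modifications on eligible residues.

    modifications maps a modification name to the residues it may modify.
    Each placement is a tuple of (0-based position, modification name) pairs.
    Assumption: a residue carries at most one modification.
    """
    sites = [
        (i, name)
        for i, aa in enumerate(sequence)
        for name, residues in sorted(modifications.items())
        if aa in residues
    ]
    placements = []
    for k in range(1, max_mods + 1):
        for combo in combinations(sites, k):
            if len({pos for pos, _ in combo}) == k:
                placements.append(combo)
    return placements


mods = {"Phospho": {"S", "T"}, "Acetyl": {"K"}}
placements = enumerate_modified_sequences("STAK", mods, max_mods=2)
print(len(placements))
for placement in placements:
    print(placement)
```

Each placement corresponds to one theoretical spectrum to compute, which is why raising the maximum number of modifications lengthens the wait.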

Manual Modifications Search: Click amino acids and apply specific modifications either from the Unimod database (which can be modified under “More” and “C. Unimod Glossary”) or as added masses. A correctly added PTM will turn the amino acid red. Otherwise, a pop-up warning will appear. Click “Calculate” when done, and the new modified sequence will be appended to the PTM table.

5.3 Parameters

  1. Set Sequence

    • Sequence: Sequence must contain at least 2 valid amino acids and no spaces.
    • Restore Sequence: Replace the current sequence with the original.
  2. Set Search Parameters

    • Parameters for ion groups, minimum intensity, PPM tolerance, minimum isotopic percentage, minimum correlation score, and minimum precursor percentage are set on page “2. MS & XIC” under “1. Filter Settings.”
  3. Dynamic Modifications Search

    • Modifications: Select PTMs from the Unimod database, which can be appended to under the “More” tab with “C. Unimod Glossary.”
    • Clear: Remove selected modifications.
    • Common PTMs: Select commonly annotated post translational modifications.
    • Max Modifications per Sequence: Select the max number of modifications to pick per sequence. Wait times increase exponentially as this number increases.
    • Max Modifications per Peptide: Choose the max number of modifications per peptide. Wait times increase exponentially as this number increases.
    • Calculate: Perform calculations of fragmentation patterns for each of the specified modifications on this spectrum-sequence pairing. This will trigger the Dynamic Modifications Search pop-up.
  4. Manual Modifications Search

    • Manual Search: Instead of iterating through all possible combinations of modifications, manual search allows users to apply any modifications to a sequence and get the calculated result for that one pairing of theoretical spectrum (calculated with the modified sequence) and experimental spectrum. Clicking this button will trigger the ‘Manual Modifications Search’ pop-up.
  5. Take Snapshot

    • Take a snapshot of any plot as is. To see your images, click ‘Export Snapshot Images’ in the upper right hand corner of the app.
  6. Export Data

    • Export Modifications Data: Export all metrics for each modified sequence as a CSV.
    • Export Markdown: Export all metrics and graphics for each modified sequence as a markdown file. It will open in an internet browser.

6 Protein Coverage

6.1 Page Description

Visualize where identified peptides map to literature sequences.

6.2 Plots and Tables

Match: See where every identified peptide sequence maps to the literature protein sequence. Hover-over information reveals the scan number and sequence.

Bar: View the number of times each amino acid in the literature sequence was identified.

Literature Sequence: See the full literature sequence with identified peptide regions in green.

Protein Table: View the number of times each peptide was associated with a protein. Clicking a row of the Protein Table creates all three visualizations above for that selected protein.
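
The counts behind the Bar plot can be approximated with the Python sketch below, which tallies how often each residue of the literature sequence is covered by an identified peptide. It relies on exact substring matching and on counting repeated identifications separately. Both are assumptions, and the app may map peptides differently.

```python
def residue_counts(protein_sequence, peptides):
    """Count, per residue, how many identified peptides cover it.

    Assumption: a peptide maps wherever it occurs as an exact substring,
    and a peptide listed twice is counted twice.
    """
    counts = [0] * len(protein_sequence)
    for peptide in peptides:
        start = protein_sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                counts[i] += 1
            start = protein_sequence.find(peptide, start + 1)
    return counts


protein = "MKTAYIAKQR"
counts = residue_counts(protein, ["TAYIAK", "AYIAKQR", "TAYIAK"])
covered = sum(1 for c in counts if c > 0)
print(counts)
print(100.0 * covered / len(protein))
```

Residues with a count above zero correspond to the regions shown in green in the Literature Sequence view.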

6.3 Parameters

  1. Protein Coverage Settings

    • Q-Value Minimum: Determine the minimum adjusted p-value.
    • Remove Contaminants: Remove any proteins marked as contaminants from the dataset. Note that decoys will remain if they pass the Q-Value filter.
  2. Take Image Snapshot

    • Take a snapshot of any plot as is. To see your images, click ‘Export Snapshot Images’ in the upper right hand corner of the app.
  3. Export Data

    • Export Protein Coverage Data: Export the number of instances of each protein identified, along with a description, as a CSV.

8 Additional Plots

8.1 Spectra Metadata

Description: Visualize the metadata from the “Scan” metadata table in “2. MS & XIC.”

Spectra Metadata Graphic: Set x-axis, y-axis, and label coordinates to metadata variables. Each point is a scan number. Values can be filtered by Scan Range and MS Level.

Parameters:

  1. Subset Data

    • Scan Range: Select the minimum and maximum scan numbers to visualize in the metadata plot.
    • MS Level: Select the minimum and maximum MS levels to visualize in the metadata plot.
  2. Select Variables

    • Select X Variables: Choose the x-axis for the metadata plot.
    • Select Y Variables: Choose the y-axis for the metadata plot.
    • Select Labels: Choose how to color the points for the metadata plot.
  3. Take Image Snapshot

    • Take a snapshot of the plot as is. To see your images, click ‘Export Snapshot Images’ in the upper right hand corner of the app.

8.2 ProMex Feature Map

Description: Create an interactive plot from MSPathFinder’s ProMex.exe output.

ProMex Feature Map: Visualize features from MS1FT files along with any associated proteins. A static version of this plot is exported by ProMex.exe.

ProMex Feature Table: See all the elution, mass, and abundance data from MS1FT.

Parameters:

  1. Upload MS1FT File

    • Upload MS1FT: The loading interface is the same as is described in “1. Upload Data.”
    • Test File: A test file for the ProMex feature map is easily enabled with a switch.
  2. Subset Data

    • Number of Highest Abundance Samples: Take the top n samples when ranked from high to low abundance.
    • Number of Lowest Abundance Samples: Take the bottom n samples when ranked from high to low abundance.
  3. Protein Annotation

    • Filter by Protein: See features that map to a specific protein or set of proteins.
  4. Take Image Snapshot

    • Take a snapshot of the plot as is. To see your images, click ‘Export Snapshot Images’ in the upper right hand corner of the app.

8.3 Unimod Glossary

Description: Append to a locally stored version of the Unimod Glossary of post-translational modifications.

Unimod Glossary Table: View the glossary of post-translational modifications. Note that the glossary can be updated with your own CSV of modifications (column titles must match those of this table), or with a pop-up menu.

Add Modification Interface: Add a specific modification by changing the name, adding a mass, selecting amino acids that are modified, and building the molecular formula by selecting the element and adding the number of atoms. When the “Add” button underneath Number of Atoms is clicked, that number of atoms of the element will be added to the molecular formula section. When finished, click “Add” at the bottom of the pop-up window or exit.

Parameters:

  1. Select Columns

    • Select Table Columns: Choose columns to visualize in the datatable.
  2. Append Table

    • Add Modification: Append user-specified modification to the list of possible modifications. The most important fields to specify are the monoisotopic weight and the modified sites.
    • Add Glossary File: Add modification data from a CSV. The input file must have the same columns and column names as the glossary table. Click ‘Add Modification’ after the CSV finishes uploading.
  3. Export Data

    • Export Glossary: Export entire glossary as a CSV.
    • Export New Modifications: Export the new set of user-entered modifications.

9 Export Modules

9.1 Data

Each data export button automatically names and downloads files once clicked.

9.2 Images

The export image pop-up has three main components:

  1. Image Table: Click any image name in the table to see that graphic. Note that all snapshots are stored and automatically numbered and named, so that no images overwrite each other.

  2. Set Image Size: After clicking an image, change its width and height by entering the new values and clicking “Apply New Size.”

  3. Menu Buttons: “Delete All” removes all images from the table. “Delete Image” removes the selected image from the table. You can export the images as PNG, JPG, or an HTML file that keeps the plot interactive. Click “Exit” to leave the pop-up box.